Variable Selection is Hard

Authors

  • Dean P. Foster
  • Howard J. Karloff
  • Justin Thaler
Abstract

Variable selection for sparse linear regression is the problem of finding, given an $m \times p$ matrix $B$ and a target vector $y$, a sparse vector $x$ such that $Bx$ approximately equals $y$. Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a $k'$-sparse $x$ with $\|Bx - y\|^2 \le h(m, p)$, where $k' = k \cdot 2^{\log^{1-\delta} p}$ and $h(m, p) \le p^{C_1} m^{1 - C_2}$, where $\delta > 0$, $C_1 > 0$, $C_2 > 0$ are arbitrary. This is true even under the promise that there is an unknown $k$-sparse vector $x^*$ satisfying $Bx^* = y$. We prove a similar result for a statistical version of the problem in which the data are corrupted by noise. To the authors' knowledge, these are the first hardness results for sparse regression that apply when the algorithm simultaneously has $k' > k$ and $h(m, p) > 0$.
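
To make the problem statement concrete, the following is a minimal Python sketch of the naive approach: try every support of size k and solve a least-squares problem on each. It is only an illustration, not the construction or any algorithm from the paper. The function name `best_k_sparse`, the random test instance, and the use of NumPy are assumptions introduced here. The search visits all "p choose k" supports, so its running time grows exponentially in k, which is the kind of cost the hardness result suggests cannot be avoided in general by polynomial-time methods.

```python
from itertools import combinations

import numpy as np


def best_k_sparse(B, y, k):
    """Exhaustively search all supports of size k (hypothetical helper).

    Returns the k-sparse x minimizing ||Bx - y||^2 and that squared error.
    """
    m, p = B.shape
    # Start from the all-zero vector, whose squared error is ||y||^2.
    best_x, best_err = np.zeros(p), float(y @ y)
    for support in combinations(range(p), k):
        cols = list(support)
        # Least-squares fit restricted to the chosen columns.
        coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
        err = float(np.sum((B[:, cols] @ coef - y) ** 2))
        if err < best_err:
            x = np.zeros(p)
            x[cols] = coef
            best_x, best_err = x, err
    return best_x, best_err


# Small synthetic instance satisfying the promise B x* = y with x* 2-sparse.
rng = np.random.default_rng(0)
m, p, k = 8, 10, 2
B = rng.standard_normal((m, p))
x_star = np.zeros(p)
x_star[[1, 6]] = [2.0, -3.0]
y = B @ x_star

x_hat, err = best_k_sparse(B, y, k)
```

On a small instance like this one the search is instantaneous, but the number of candidate supports is already 45 for p = 10 and k = 2, and it grows rapidly as p and k increase.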

Similar resources

An improved genetic algorithm for multidimensional optimization of precedence-constrained production planning and scheduling

Integration of production planning and scheduling is a class of problems commonly found in manufacturing industry. This class of problems associated with precedence constraint has been previously modeled and optimized by the authors, in which, it requires a multidimensional optimization at the same time: what to make, how many to make, where to make and the order to make. It is a combinatorial,...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

Using distance covariance for improved variable selection with application to learning genetic risk models.

Variable selection is of increasing importance to address the difficulties of high dimensionality in many scientific areas. In this paper, we demonstrate a property for distance covariance, which is incorporated in a novel feature screening procedure together with the use of distance correlation. The approach makes no distributional assumptions for the variables and does not require the specifi...

Killing them softly: managing pathogen polymorphism and virulence in spatially variable environments

Understanding why pathogen populations are genetically variable is vital because genetic variation fuels evolution, which often hampers disease control efforts. Here I argue that classical models of evolution in spatially variable environments - specifically, models of hard and soft selection - provide a useful framework to understand the maintenance of pathogen polymorphism and the evolution o...

The Florida State University College of Arts and Sciences Theories on Group Variable Selection in Multivariate Regression Models

We study group variable selection on multivariate regression model. Group variable selection is selecting the non-zero rows of coefficient matrix, since there are multiple response variables and thus if one predictor is irrelevant to estimation then the corresponding row must be zero. In a high dimensional setup, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS acc...

Geostatistical simulation of RQD variable using investigation of spatial continuity between Quaternary alluvial layer and hard rock of Gohar-Zamin mine to the determination of permeable zones

In this research, a sequential Gaussian simulation method has been used to determine the permeable zones in the hard-rock aquifer of the Gohar-Zamin open pit mine. For this purpose, 4946 RQD data from eighty-seven exploratory boreholes was used and exploratory-spatial data analysis of these data was performed using the preliminary statistics, location maps, histograms and variograms. Results of ...

Publication date: 2015